10 research outputs found

VThreads: A novel VLIW chip multiprocessor with hardware-assisted PThreads

Author: Agron
Andrews
Arvind
Brodersen
Chouliaras
Chouliaras
Chouliaras
Chouliaras
Colwell
Cong
D. Stevens
de Dinechin
De Micheli
Faraboschi
Gupta
Hubener
Kathail
Lin
Lin
Lübbers
Mandelbrot
Milward
Muck
Oliveira
Owaida
Papakonstantinou
Robert Thomson
Rooholamin
Schlansker
Stevens
Stevens
Thomson
Tullsen
V.A. Chouliaras
V.M. Dwyer
Villarreal
Watson
Windh
Ziavras
Publication venue: 'Elsevier BV'

Crossref

Gigabyte per second streaming lossless data compression hardware based on a configurable variable-geometry CAM dictionary

Author: J.L. Nunez-Yanez
V.A. Chouliaras
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2006

Crossref

An efficient multiple precision floating-point Multiply-Add Fused unit

Author: Manolopoulos, K.; Reisis, D.; Chouliaras, V.A.
Publication date: 01/01/2016

Multiply-Add Fused (MAF) units play a key role in the processor's performance for a variety of applications. The objective of this paper is to present a multi-functional, multiple precision floating-point Multiply-Add Fused (MAF) unit. The proposed MAF is reconfigurable and able to execute a quadruple precision MAF instruction, or two double precision instructions, or four single precision instructions in parallel. The MAF architecture features a dual-path organization reducing the latency of the floating-point add (FADD) instruction and utilizes the minimum number of operating components to keep the area low. The proposed MAF design was implemented on a 65 nm silicon process achieving a maximum operating frequency of 293.5 MHz at 381 mW power. © 2015 Elsevier Ltd. All rights reserved

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

A Novel Delta-Sigma Control System Processor and Its VLSI Implementation

Author: Chouliaras V.A.
Goodall Roger M.
Nunez-Yanez J.L.
Wu Xiaofeng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2008

This paper describes a novel control system processor architecture based on Delta-Sigma (ΔΣ) modulation, known as the ΔΣ-CSP. The ΔΣ-CSP utilizes 1-bit processing, which is a new concept in digital control applications, with the direct benefit of making multi-bit multiplication operations redundant. A simple conditional-negate-and-add (CNA) unit is instead used for operations in control law implementations. For this reason, the proposed processor has a very small silicon footprint and runs at very high frequencies, making it ideal for high-sampling-rate, real-time control applications. A number of ΔΣ-CSP configurations have been implemented as VLSI hard macros in a high-performance 0.13-μm CMOS process, and a particular configuration achieved a post-route operating frequency of 355 MHz, resulting in a 2.17 MHz sampling rate for a fourth-order control law implementation. Additional results prove that the ΔΣ-CSP compares very favorably, in terms of silicon area and sampling rates, to two other specialized digital control processing systems, including direct, hardwired implementation of control laws; at the same time, it substantially outperforms software implementations of control laws running on very wide, general-purpose VLIW architectures.

University of Huddersfield Repository

Explore Bristol Research
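
The conditional-negate-and-add (CNA) idea mentioned in the Delta-Sigma control system processor abstract above can be illustrated with a minimal software sketch. It assumes each 1-bit sample encodes +1 (bit 1) or -1 (bit 0), so a coefficient-times-sample product reduces to keeping or negating the coefficient. The function name `cna_accumulate` and the sample values are hypothetical and are not taken from the paper.

```python
def cna_accumulate(coeffs, bits):
    """Sum coeff * x, where each x is +1 (bit 1) or -1 (bit 0), without multiplying."""
    if len(coeffs) != len(bits):
        raise ValueError("coeffs and bits must have the same length")
    acc = 0
    for c, b in zip(coeffs, bits):
        # Conditional negate: keep c for a 1 bit, negate it for a 0 bit, then add.
        acc += c if b else -c
    return acc


# Illustrative values only (not from the paper).
coeffs = [3, -5, 7, 2]
bits = [1, 0, 0, 1]
print(cna_accumulate(coeffs, bits))  # 3 + 5 - 7 + 2 = 3
```

For scale, the figures quoted in the abstract (355 MHz clock, 2.17 MHz sampling rate) correspond to roughly 164 clock cycles per sample.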

Customization of an embedded RISC CPU with SIMD extensions for video encoding: A case study

Author: Baumgarte
Chouliaras
Chouliaras
Chouliaras
D. Reisis
Diefendorff
Ghanbari
J.L. Nunez-Yanez
Jain
K. Manolopoulos
K. Nakos
Liu
Patterson
Po
Ramakrishna Rau
Rao
Reoxiang
S. Agha
Tham
V.A. Chouliaras
V.M. Dwyer
Publication venue: 'Elsevier BV'

Crossref

Fully systolic FFT architecture for giga-sample applications

Author: Babionitakis, K.; Chouliaras, V.A.; Manolopoulos, K.; Nakos, K.; Reisis, D.; Vlassopoulos, N.
Publication date: 01/01/2010

We present a novel 4096 complex-point, fully systolic VLSI FFT architecture based on the combination of three consecutive radix-4 stages, resulting in a 64-point FFT engine. The outcome of cascading these 64-point FFT engines is an improved architecture that efficiently processes large input data sets in real time. Using 64-point FFT engines reduces the buffering and the latency to one third of a fully unfolded radix-4 architecture, while the radix-4 schema simplifies the calculations within each engine. The proposed 4096 complex-point architecture has been implemented on an FPGA, achieving a post-route clock frequency of 200 MHz and a sustained throughput of 4096 points per 20.48 μs. It has also been implemented on a high-performance 0.13 μm, 1P8M CMOS process, achieving a worst-case (0.9 V, 125 °C) post-route clock frequency of 604.5 MHz and a sustained throughput of 4096 points per 3.89 μs while consuming 4.4 W. The architecture is extended to accomplish FFT computations of 16K, 64K and 256K complex points with 352, 256 and 188 MHz operating frequencies respectively. © 2009 Springer Science+Business Media, LLC

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens
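
As a rough check of the sizes quoted in the FFT abstract above, the sketch below works through the arithmetic: three radix-4 stages give a 64-point engine, and two cascaded 64-point engines cover 4096 points. The helper `frame_time_us` and its default of one complex point per clock cycle are assumptions made for illustration. Under that assumption the 200 MHz FPGA figure is reproduced, but the sketch does not attempt to model the CMOS implementation.

```python
from fractions import Fraction

RADIX = 4
STAGES_PER_ENGINE = 3

engine_points = RADIX ** STAGES_PER_ENGINE  # 4^3 = 64 points per engine
total_points = engine_points ** 2           # 64 * 64 = 4096 points


def frame_time_us(points, clock_mhz, points_per_cycle=1):
    """Time in microseconds to stream `points` samples (points_per_cycle is an assumption)."""
    cycles = Fraction(points, points_per_cycle)
    return cycles / Fraction(clock_mhz)  # cycles divided by MHz yields microseconds


print(engine_points, total_points)              # 64 4096
print(float(frame_time_us(total_points, 200)))  # 20.48
```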

Thread-parallel MPEG-2 and MPEG-4 encoders for shared-memory system-on-chip multiprocessors

Author: Chouliaras, V.A.; Jacobs, T.R.; Núñez-Yanez, J.L.; Manolopoulos, K.; Nakos, K.; Reisis, D.
Publication date: 01/01/2007

This work focuses on speeding up MPEG-2 and MPEG-4 encoding by using thread parallelism for shared-memory, System-on-Chip (SoC) multiprocessors. The performance improvement of the MPEG encoders is demonstrated by reducing the dynamic instruction count across multiple processor contexts and then mapping the encoders onto a configurable SoC multiprocessor. The resulting reduction in the dynamic instruction count of the parallelized MPEG-2 TM5 encoder reaches a maximum of 95% for 32 processor contexts, and that of the MPEG-4 XViD encoder a maximum of 83% for 16 processor contexts, both compared to the sequential encoder. To realize the parallelized encoders we present a configurable, N-way, extensible, bus-based, cache-coherent SoC multiprocessor, augmented with data-parallel coprocessors, and we give the VLSI implementation for the 2-way and 4-way configurations.

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

Thread-Parallel MPEG-2 and MPEG-4 Encoders for Shared-Memory System-On-Chip Multiprocessors

Author: D. Reisis
J.L. Núñez-Yanez
K. Manolopoulos
K. Nakos
T.R. Jacobs
V.A. Chouliaras
Publication venue: 'Informa UK Limited'

Crossref

Customization of an embedded RISC CPU with SIMD extensions for video encoding: A case study

Author: Chouliaras, V.A.; Dwyer, V.M.; Agha, S.; Nunez-Yanez, J.L.; Reisis, D.; Nakos, K.; Manolopoulos, K.
Publication date: 01/01/2008

This work presents a detailed case study in customizing a configurable, extensible, 32-bit RISC processor with vector/SIMD instruction extensions for the efficient execution of block-based video-coding algorithms utilizing a proprietary co-design environment. In addition to the default Full-Search motion estimation of the MPEG-2 Test Model 5, fourteen fast ME algorithms were implemented in both scalar and vector form. Results demonstrate a reduction of up to 68% in the dynamic instruction count of the full search-based encoder whereas the fast motion estimation algorithms achieved a reduction in instruction count of nearly 90%, both accelerated via three 128-bit vector/SIMD instructions when compared to the scalar, reference implementation of the standard. We address in detail the profiling, vectorization and the development of these vector instruction set extensions, discuss in depth the implementation of a parametric vector accelerator that implements these instructions and show the introduction of that accelerator into a 32-bit RISC processor pipeline, in a closely-coupled configuration. © 2007 Elsevier B.V. All rights reserved

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

On the Performance Improvement of Sub-sampling MPEG-2 Motion Estimation Algorithms with Vector/SIMD Architectures

Author: J.R. Jain
J.Y. Tham
L.-M. Po
L.K. Liu
M. Ghanbari
R. Li
T. Sikora
V.A. Chouliaras
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Crossref